InfoMiner+: Mining Partial Periodic Patterns with Gap Penalties

Authors

  • Jiong Yang
  • Wei Wang
  • Philip S. Yu
Abstract

In this paper, we focus on mining periodic patterns while allowing some degree of imperfection, in the form of random replacement, from a perfect periodic pattern. Information gain was proposed to identify patterns whose events have vastly different occurrence frequencies and to adjust for deviation from a pattern. However, it imposes no penalty when there are gaps between the pattern occurrences. In many applications, e.g., bioinformatics, it is important to identify subsequences in which a pattern repeats perfectly (or nearly perfectly). As a solution, we extend the information gain measure to include a penalty for gaps between pattern occurrences. We call this measure generalized information gain. Furthermore, we want to find a subsequence S' such that, for a pattern P, the generalized information gain of P in S' is high. This is particularly useful for locating repeats in DNA sequences. We develop an effective mining algorithm, InfoMiner+, to simultaneously mine significant patterns and the associated subsequences.
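The idea of rewarding repetitions and penalizing gaps can be illustrated with a small sketch. It is not the InfoMiner+ algorithm or the paper's exact definition: the function names, the `*` wildcard, the choice of penalty (a missed period costs as much as a matched one earns), and the alignment of periods to offset 0 are all assumptions made for illustration. Rare events carry more information than frequent ones, and the best-scoring contiguous run of periods is found with a maximum-subarray scan.

```python
import math
from collections import Counter

WILDCARD = "*"  # assumed "don't care" symbol, not taken from the paper


def event_information(sequence):
    """Information of each event: -log of its empirical frequency,
    with the logarithm base set to the number of distinct events."""
    counts = Counter(sequence)
    base = max(len(counts), 2)  # avoid an undefined logarithm of base 1
    total = len(sequence)
    return {e: -math.log(c / total, base) for e, c in counts.items()}


def pattern_information(pattern, info):
    """Sum of the information of the pattern's non-wildcard events.
    Every non-wildcard event must occur in the sequence."""
    return sum(info[e] for e in pattern if e != WILDCARD)


def matches(pattern, segment):
    """True if the segment agrees with the pattern at every fixed position."""
    return all(p == WILDCARD or p == s for p, s in zip(pattern, segment))


def best_subsequence(sequence, pattern):
    """Return (score, (start, end)) of the contiguous run of periods with
    the highest gap-penalized score, or (0.0, None) if nothing matches.

    Assumed scoring: a matched period adds the pattern's information,
    a missed period (a gap) subtracts the same amount.
    """
    period = len(pattern)
    gain = pattern_information(pattern, event_information(sequence))
    best_score, best_range = 0.0, None
    running, start = 0.0, 0
    for i in range(0, len(sequence) - period + 1, period):
        if running <= 0:
            # A non-positive prefix cannot help; restart the run here.
            running, start = 0.0, i
        segment = sequence[i:i + period]
        running += gain if matches(pattern, segment) else -gain
        if running > best_score:
            best_score, best_range = running, (start, i + period)
    return best_score, best_range


score, span = best_subsequence("abcabcxyzxyzxyzabc", "ab*")
print(score, span)
```

In this example the long gap of `xyz` periods outweighs the single later occurrence, so the reported span covers only the first two repetitions. A scheme with no gap penalty would count all three occurrences equally, regardless of where they fall.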

Similar resources

STAMP: On Discovery of Statistically Important Pattern Repeats in Long Sequential Data

In this paper, we focus on mining periodic patterns allowing some degree of imperfection in the form of random replacement from a perfect periodic pattern. In InfoMiner+, we proposed a new metric, namely generalized information gain, to identify patterns with events of vastly different occurrence frequencies and to adjust for the deviation from a pattern. In particular, a penalty is allowed to ...

Efficient Mining of Partial Periodic Patterns in Time Series Database (ICDE 99)

Partial periodicity search, i.e., search for partial periodic patterns in time-series databases, is an interesting data mining problem. Previous studies on periodicity search mainly consider finding full periodic patterns, where every point in time contributes (precisely or approximately) to the periodicity. However, partial periodicity is very common in practice since it is more likely that on...

An Efficient Pruning and Filtering Strategy to Mine Partial Periodic Patterns from a Sequence of Event Sets

Partial periodic patterns are commonly seen in real-world applications. The major problem of mining partial periodic patterns is the efficiency problem due to a huge set of partial periodic candidates. Although some efficient algorithms have been developed to tackle the problem, the performance of the algorithms significantly drops when the mining parameters are set low. In the past, the author...

Comparing Methods of Mining Partial Periodic Patterns in Multidimensional Time Series Databases

Methods to efficiently find patterns in periodic one-dimensional time series databases have been heavily examined in recent data mining research. The issue at hand is whether or not these algorithms can be translated to find such patterns in multidimensional periodic time series dataset by performing classification techniques to reduce the dimensionality. This project will explore two solutions...

Publication date: 2002